First, we need to know how far back to go. We want an equal number of days before and after November 18, 2014.
In [1]:
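from datetime import datetime, timedelta
# The pivot date we are comparing around
pivot = datetime.strptime('11/18/2014', '%m/%d/%Y')
# Despite the name, this is the last date with data on the Data Portal, not the run date
today = datetime.strptime('1/18/2016', '%m/%d/%Y')
print today - pivot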
In [2]:
period = timedelta(days=426)
print pivot - period
This notebook was run on 1/26/2016, 434 days after November 18, 2014. But the Data Portal only has data through 1/18/2016, which is 426 days after the pivot, so we go back 426 days before the pivot as well. That works out to 9/18/2013.
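The window arithmetic above can be restated as a small standalone sketch. The names `pivot_day`, `last_available`, `span` and `window_start` are made up for this example so they don't clobber the notebook's own variables, and it uses `print()` calls so it also runs on Python 3.

```python
from datetime import date

# Dates taken from the text above.
pivot_day = date(2014, 11, 18)       # the pivot
last_available = date(2016, 1, 18)   # last date with data on the Data Portal

# How many days of data exist after the pivot.
span = last_available - pivot_day

# Mirror that span backwards to get the start of the "before" window.
window_start = pivot_day - span

print(span.days)      # 426
print(window_start)   # 2013-09-18
```

Deriving the start date from the two known dates avoids hard-coding 426, so the window stays symmetric if the portal's last available date changes.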
Now let's download the crime data for processing. We've already filtered it on the Data Portal side to make sure it only contains a) the date range we want, b) the UCRs we're interested in, and c) the beats we're interested in.
In [3]:
import pandas as pd
url = 'https://data.cityofchicago.org/api/views/qa42-2iy9/rows.csv?accessType=DOWNLOAD'
frame = pd.read_csv(url, parse_dates=['Date'])
print frame.head(2)
print '%d crimes found' % len(frame)
Before we do anything else, we should add a date-only column so we can compute statistics per calendar day, which is the level we're after.
In [4]:
frame['Date Only'] = pd.to_datetime(frame['Date'].apply(lambda x: x.date()))
pivot = pivot.date()
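As an alternative sketch, not what this notebook ran: `Series.dt.normalize()` sets the time of day to midnight while keeping the datetime64 dtype, and a `pd.Timestamp` pivot keeps the comparison between like types. Newer pandas releases may refuse to compare a datetime64 column against a plain `datetime.date`, which is the assumption motivating this variant. The `toy` frame and `pivot_ts` are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the downloaded frame; these rows are invented.
toy = pd.DataFrame({
    'Date': pd.to_datetime([
        '2014-11-17 23:50:00',
        '2014-11-18 00:05:00',
        '2014-11-18 14:30:00',
    ])
})

# Drop the time of day but keep the datetime64 dtype.
toy['Date Only'] = toy['Date'].dt.normalize()

# A Timestamp pivot compares cleanly against a datetime64 column.
pivot_ts = pd.Timestamp('2014-11-18')

print((toy['Date Only'] >= pivot_ts).tolist())   # [False, True, True]
```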
Our first question: were there more crimes of all types committed before or after the pivot?
In [9]:
print '%d crimes on or after %s' % (frame[frame['Date Only'] >= pivot].Date.count(), pivot)
In [10]:
print '%d crimes before %s' % (frame[frame['Date Only'] < pivot].Date.count(), pivot)
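The two counts can also come from a single boolean mask, which guarantees they add up to the total number of rows. This is a minimal sketch on an invented `toy` frame; `pivot_ts`, `n_after` and `n_before` are hypothetical names, and it assumes no dates are missing (a missing date would be counted as "before" here).

```python
import pandas as pd

# Invented rows standing in for the real crime data.
toy = pd.DataFrame({
    'Date Only': pd.to_datetime([
        '2014-11-16', '2014-11-17', '2014-11-18', '2014-11-18', '2014-11-20',
    ])
})
pivot_ts = pd.Timestamp('2014-11-18')

# True for rows on or after the pivot, False for rows before it.
on_or_after = toy['Date Only'] >= pivot_ts

n_after = int(on_or_after.sum())       # True counts as 1
n_before = int((~on_or_after).sum())   # everything else

print('%d crimes on or after %s' % (n_after, pivot_ts.date()))
print('%d crimes before %s' % (n_before, pivot_ts.date()))
```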
Now let's graph it.
In [7]:
# Let's get nicer-looking plots. Can't use ggplot because my version of matplotlib is too old (I think).
pd.set_option('display.mpl_style', 'default')
pd.set_option('display.width', 10000)
pd.set_option('display.max_columns', 60)
# We need to specifically ask matplotlib to display plots inline
%matplotlib inline
import matplotlib.pyplot as plt
frame.groupby('Date Only').count().plot(legend=None)
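One thing to note about the plot: `groupby(...).count()` returns one count per column, so the chart draws one line for every column in the frame. If a single crimes-per-day line is what's wanted, `size()` is one way to get it. The sketch below uses an invented `toy` frame with placeholder IUCR values, and the reindexing step is an added assumption so that days with no crimes show up as zero rather than being skipped.

```python
import pandas as pd

# Invented rows; the IUCR values are placeholders.
toy = pd.DataFrame({
    'Date Only': pd.to_datetime(
        ['2014-11-17', '2014-11-18', '2014-11-18', '2014-11-20']),
    'IUCR': ['0486', '0486', '0820', '0820'],
})

# One number per day: the count of rows on that date.
daily = toy.groupby('Date Only').size()

# Fill in calendar days that had no rows at all.
all_days = pd.date_range(daily.index.min(), daily.index.max(), freq='D')
daily = daily.reindex(all_days, fill_value=0)

print(daily.tolist())   # [1, 2, 0, 1]
```

Calling `daily.plot()` on the result would then draw a single series.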
Last question (for now): Did any particular type of crime spike before or after the pivot?
In [8]:
for ucr in frame['IUCR'].unique():
    print ucr
    ucr_frame = frame[frame['IUCR'] == ucr]
    print '%d crimes on or after %s' % (ucr_frame[ucr_frame['Date Only'] >= pivot].Date.count(), pivot)
    print '%d crimes before %s' % (ucr_frame[ucr_frame['Date Only'] < pivot].Date.count(), pivot)
    print '---'
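The same per-code breakdown can be laid out as one table, which makes the before/after numbers easier to compare side by side. This is a sketch on an invented `toy` frame: the IUCR values are placeholders, and `pivot_ts`, `period` and `table` are names made up for the example.

```python
import pandas as pd

# Invented rows; the IUCR values are placeholders.
toy = pd.DataFrame({
    'IUCR': ['0486', '0486', '0820', '0820', '0820'],
    'Date Only': pd.to_datetime([
        '2014-11-10', '2014-11-19', '2014-11-01', '2014-11-18', '2014-12-02',
    ]),
})
pivot_ts = pd.Timestamp('2014-11-18')

# Label each row by which side of the pivot it falls on.
toy['period'] = (toy['Date Only'] >= pivot_ts).map(
    {True: 'on_or_after', False: 'before'})

# Rows are IUCR codes, columns are the two periods, cells are counts.
table = pd.crosstab(toy['IUCR'], toy['period'])
print(table)
```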